ENVST 206: Environmental Data Science, Hamilton College

Instructions

There are 10 questions to this activity. Save your answers in word document that you will hand in on Blackboard using a .pdf or .docx extension. Save your file with your last name first in the filename to help me keep track of files for grading. Keep your script file in your GitHub folder and make sure that all changes are pushed to GitHub. Code for all questions should be clearly commented with the question number. You will include a link to your script file as a part of your final question in this activity. The assignment is worth 30 points. Include any plots you make for a question in your answers.

Learning objectives

1. Learn to work with rasters in R

2. Learn about working with remote sensing data

3. Summarize and visualize raster data

What are Harmful Algal Blooms (HAB)?

Harmful Algal Blooms (HABs) occur when the conditions in a body of water allow for algae to rapidly grow. Some types of algae release toxins into the water as they grow resulting in HABs. HABs contaminate freshwater resources posing a public health concern. Swimming in a HAB may sicken people and consuming the water can sicken or kill animals and people. Algal blooms can also create dead zones in water bodies as the oxygen content of the water is lowered due to the abundance of photosynthestic activity. The presence of HABs can be observed with discolation of the water that may be blue-green, red, or brown and a scummy film on the water. Not all algae blooms may be harmful, but many health departments regularly test water to check for HABs especially when a bloom is present.

Source: Environmental Protection Agency

The discoloration of bodies of water from algal blooms can often be observed from satellites when the blooms are large enough. Below is a satellite image of algae in the western side of Lake Erie:

Source: NASA

HABs often arise when conditions allow for higher levels of sunlight, slow moving water, and there are large inputs of nutrients like nitrogen and phosophorus. Residential and agriculutural areas often spread high amounts of fertilizer (contain nitrogen and phosphorus) to promote plant growth. These nutrients can get carried into bodies of water with runoff from rainfall. HABs are a particular concern in many lakes in upstate New York where beaches can close and warnings are issued to the public. In 2019, Verona Beach on Onieda Lake (Northeast of Syracuse) was frequently closed due to HABs. We will use satellite imagery to map this algal bloom and take a closer look at the landcover surronding the area. The presence of algae alone does not indicate that the type of algae may be the type that produces harmful toxins, but tracking algae can be a key step in thinking about lake health and planning testing for toxins.

Satellite imagery

You will look at the Oneida Lake area using images collected from the European Space Agency Copernicus Sentinel-2 satellite. Full documentation can be found at https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi and sentinel-2 collects data for 13 spectral bands ar differing spatial resolutions: four bands at 10 meters, six bands at 20 meters and three bands at 60 meters spatial resolution. You will work with all bands at 20 meters spatial resolution which means some bands have been aggregated to a courser spatial resolution and others have been downscaled to a finer spatial resolution. We will be working with the level 2A data that has the highest level of quality control and digital numbers represent bottom of atmosphere reflectance. These files are freely available online and typically cover a much larger geographic area. I have cropped the original files so that you can work with smaller files focused on the Oneida Lake area. You can find these files in the sentinel folder within the data folder shared with you. We will be looking at images taken on August 14 2019 for 4 bands, band 2, 3, 4 and 8.

Work with Raster data

You will need the raster package to work with raster data in R. You will need to first install the package and then load it into your workspace. Let’s also load ggplot2 to the environment.

install.packages(c("raster"))
library(raster)
library(ggplot2)
library(rgdal)

Now let’s read in the data. You will use the raster function to load these images to you environment. Since all of these rasters are in the same folder, I will create an object that declares the full file path in R. This means that I could move the folder and change only a single file path rather than every individual file. Writing the file path in your code also helps you keep track of data management and fully reproduce your analysis.

#set up directory for oneida data folder
dirR <- "/Users/hkropp/Google Drive/teaching/2020/Fall 2020/EnvDataSci/activity/data/activity 8/oneida"

#read in Sentinel data

rdatB2 <- raster(paste0(dirR,"/sentinel/T18TVN_20190814T154911_B02_20m.tif"))
rdatB3 <- raster(paste0(dirR,"/sentinel/T18TVN_20190814T154911_B03_20m.tif"))
rdatB4 <- raster(paste0(dirR,"/sentinel/T18TVN_20190814T154911_B04_20m.tif"))
rdatB8 <- raster(paste0(dirR,"/sentinel/T18TVN_20190814T154911_B08_20m.tif"))

Sentinel 2A data is stored such that each band of light is in its own raster are stored with the value of that raster pixel equal to the reflectance/10,000. Let’s take a quick look at the raster values. Let’s start by looking at the blue light reflected over the Oneida Lake area. Here, the plot function is also used for mapping raster data, similar to the vector data in our last exercise. We can also divide the entire raster by 10,000 to look at the reflectance in our typical proportional value. If you refer to the picture with the reflectance spectrum above, you will notice that band 2 corresponds to blue light, 3 green, 4 red, and 8 is Near Infrared.

plot(rdatB2/10000)

You will notice that there are a few cells with values over one. This can happen with remote sensing data due to measurement errors or light getting reflected from the atmosphere or other surfaces that are not accounted for when we consider incoming light from the sun. Looking at single bands of light can be difficult to get an idea of the surface that we are looking at. We want to composite (overlay) the red, green, and blue bands of light to create a true color image. The plotRGB function will create this color composite. The main arguments that you need are to declare the red, green, and blue bands of light you will be plotting in that order. An easy way to specify that our red, green, blue rasters will be to create a stack of rasters. A stack is a set of overlaping rasters that describe the same geographical area, and we can easily reference all of our rasters together. We can also perform an operation on all of the rasters together, like dividing by 10,000 to convert to actual reflectance from the digital number.

#stack red green blue
rgbS <- stack(rdatB4,rdatB3,rdatB2)/10000
#view raster, a few pixels in blue have reflectanc above 1 so set scale
plotRGB(rgbS, scale=2)

You’ll notice that most of our image looks black and we can’t see much. That’s because we have values of high reflectance like the clouds and the few problem pixels in the blue band that have a reflectance above one. We might want to filter those out in a more formal analysis, but for this class we can ignore that data. However, it is problematic since most of the values are a mucher lower reflectance, we only see black. We can add a contrast stretch so that the colors are displayed to optimize contrast between the values that occur most frequently.

#don't need the scale argument when adding in the contrast stretch
plotRGB(rgbS, stretch="lin")

The other aspect that you may notice if you zoom in is that some of the areas look pixelated. This should not be the case given each pixel on land is 20 x 20 meters. We should be seeing a higher resolution photo. If you look at the documentation for plotRGB, you’ll notice that the plotting function will lower the resolution of the photo to only show 0.5 million pixels in the plot. You can change this by specifying the maximum number of pixels to plot.

#full resolutions 
#get the total number of pixels by multiplying the number of rows and columns
#in the raster
plotRGB(rgbS, stretch="lin",maxpixels=rgbS@nrows*rgbS@ncols)

If you have any form of color blindness, your answer may be impacted for answer 4. Describe what you see and your perception of the two maps. There is no right or wrong answer. I evaluate your answer to this question based on your ability to thoroughly describe both maps and interpret you perception of the landscape.

Analyzing raster data

Next you will focus on learning to work with raster data for data analysis and summary. We’ve learned that we can easily apply a calculation to all of the cells in a raster when you converted the digital number to reflectance. You can also conduct calculations between rasters with matching spatial extents. You use a similar format and type the mathematical operations with the names of the rasters. We can use this to calculate the Normalized Difference Vegetation Index for each cell. This is often called raster math in Geographical Information Systems.

#calculate NDVI
#NIR-red/(NIR + RED)
NDVI <- (rdatB8 - rdatB4) / (rdatB8 + rdatB4)
#visualize NDVI across the Oneida lake area
plot(NDVI)

Another useful operation with rasters is to focus the raster values at particular points of interest. You can see across the landscape there are different types of landcover such as a highway, forest, and farms. I’ve generated point shapefiles that include points located in different types of landcover classes. Let’s take a look at the reflectance of different bands of light across these landcover classes. First you will have to extract the values in the raster for the cell that contains the point.

#read in landcover points data
#you may have to change your slash direction if you are on a windows computer
#I've also turned off the info print out here when you read in the file
algae <- readOGR(paste0(dirR,"/Oneida/algae.shp"), verbose=FALSE)
agri <- readOGR(paste0(dirR,"/Oneida/agriculture.shp"), verbose=FALSE)
forest <- readOGR(paste0(dirR,"/Oneida/forest.shp"), verbose=FALSE)
water <- readOGR(paste0(dirR,"/Oneida/water.shp"), verbose=FALSE)
wetlands <- readOGR(paste0(dirR,"/Oneida/wetlands.shp"), verbose=FALSE)

Let’s take a look at where these points are on the map:

#plot points and true color
plotRGB(rgbS, stretch="lin",maxpixels=2297430)
plot(algae, add=TRUE, col=rgb(0.5,0.5,0.5,0.5), pch=19)
plot(agri, add=TRUE, col=rgb(0.75,0.5,0.5,0.5), pch=19)
plot(forest, add=TRUE, col=rgb(0.75,0.75,0.25,0.5), pch=19)
plot(water, add=TRUE, col=rgb(0.33,0.75,0.75,0.5), pch=19)
plot(wetlands, add=TRUE, col=rgb(0.33,0.33,0.65,0.5), pch=19)
legend("bottomleft", c("algae","agri","forest","water","wetlands"),
       pch=19, col=c(rgb(0.5,0.5,0.5,0.5),rgb(0.75,0.5,0.5,0.5),rgb(0.75,0.75,0.25,0.5),rgb(0.33,0.75,0.75,0.5),rgb(0.33,0.33,0.65,0.5)),
                     bty="n", cex=0.75)

In order to extract the data, we need a data frame with the point coordinates. R will pull out the raster cell values at each coordinate. It will also be helpful to create an identifying column that indicates the landcover type of the point. The points all have coordinates stored in the _@coords_ section of the data. Here we’ll set up a dataframe with all of the info.

#set up a dataframe with all of the point coordinates
landExtract <-  data.frame(landcID = as.factor(rep(c("algae","water","agri","forest","wetland"),each=120)),
                        x=c(algae@coords[,1],water@coords[,1],agri@coords[,1],forest@coords[,1],wetlands@coords[,1]),
                        y=c(algae@coords[,2],water@coords[,2],agri@coords[,2],forest@coords[,2],wetlands@coords[,2]))

Now that you have your landcover dataframe set up with the coordinates of each point set up, you can get the data from the raster. First it will help to make a raster stack with all of the rasters we will want to look at and covert all of the data to reflectance. Next you will use the extract function that will create a dataframe with a column for each raster. Each column has all of the reflectance values in the cell that contained the point.

#stack all bands
allbands <-  stack(rdatB2, rdatB3, rdatB4,rdatB8)/10000
#add the raster reflectance values to the point coordinates and classes
#extract(raster, matrix of coordinates)
#raster:: helps ensure that extract comes from the raster package
ExtractOut <- raster::extract(allbands,landExtract[,2:3])
#name the bands
colnames(ExtractOut) <- c("B02","B03","B04","B08")
#combine the original data with the coordinates with the raster data
rasterEx <- cbind(landExtract,ExtractOut)
#look at data
head(rasterEx)
##   landcID        x       y    B02    B03    B04    B08
## 1   algae 435221.7 4781719 0.0325 0.0531 0.0283 0.0185
## 2   algae 435318.6 4782049 0.0319 0.0544 0.0288 0.0189
## 3   algae 434626.4 4782052 0.0298 0.0454 0.0245 0.0151
## 4   algae 434615.3 4781600 0.0280 0.0432 0.0231 0.0136
## 5   algae 435432.9 4781163 0.0321 0.0536 0.0286 0.0174
## 6   algae 435264.6 4781509 0.0330 0.0561 0.0297 0.0217

Looking at patterns in remote sensing data

A major utility in remote sensing data is to classify the land cover or objects on the landsurface. To understand how this can often be accomplished, let’s take a closer look at the reflectance of the different bands across the landcover class. Here, you can make a traditional scatterplot.

ggplot(data=rasterEx, aes(x=B02, y=B03, color=landcID))+
  geom_point(alpha=0.6)+
  theme_classic()

You can see that the different landcover class points tend to be clustered together close with similar ranges of values occuring within a landcover class. You can also see that there is little overlap between some classes. For example, water with algae in it always has higher blue and green reflectance than open water with no algae. A really useful application of remote sensing is to take advantage of these differences in reflectance and use these known differences between landcover classes to areas we can’t directly measure (think about trying to describe the millions of points in this picture alone!).